Clustering with Shared Nearest Neighbor-unscented Transform Based Estimation

Authors

  • M. Ravichandran
  • A. Shanmugam
Abstract

Subspace clustering seeks groups of objects that form clusters in the subspaces of a dataset. When clustering high-dimensional objects, the accuracy and efficiency of traditional clustering algorithms are very poor, because data objects may belong to different clusters in different subspaces, each made up of a different combination of dimensions. To address this issue, we implement a technique termed the Opportunistic Subspace and Estimated Clustering (OSEC) model on high-dimensional data to improve the accuracy of search retrieval. Clustering quality can be improved further by accounting for hubness, a phenomenon of vector-space data in which certain data points, referred to as hubs, lie at a small distance from many other data points in high-dimensional spaces; it is associated with the phenomenon of distance concentration. Hubness in high-dimensional data adversely affects many machine learning tasks, namely classification, nearest neighbor methods, outlier detection, and clustering. Hubness is a largely unexplored problem of machine learning in high-dimensional data spaces, and hubness-based clustering does not automatically determine the number of clusters in the data. Subspace clustering achieves efficient cluster validation, but it does not address the hubness problem effectively. To overcome the hubness problem in subspace clustering, nearest neighbor machine learning methods are applied to the high-dimensional data. A Shared Nearest Neighbor Clustering based on Unscented Transform (SNNC-UT) estimation method is developed to overcome the hubness problem while determining the clusters in the data. The core objective of SNNC is to group points into clusters such that the points within a cluster are more similar to each other than to points in a different cluster. SNNC-UT estimates the relative density, i.e., the probability density, in the nearest region and thereby obtains a more robust definition of density. SNNC-UT handles overlapping situations using the unscented transform and calculates the statistical distance of a random variable that undergoes a nonlinear transformation. The experimental performance of SNNC-UT and k-nearest neighbor hubness in clustering is evaluated in terms of clustering quality, distance measurement ratio, clustering time, and energy consumption.
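
The two notions the abstract relies on, hubness and shared nearest neighbor similarity, can be illustrated with a short Python sketch. This is not the authors' SNNC-UT method and it contains no unscented transform step; it only shows one common way to measure hubness (the k-occurrence count, i.e., how often a point appears in other points' k-nearest-neighbor lists) and one common definition of shared nearest neighbor similarity (the overlap of two k-nearest-neighbor lists). The function names, the toy data, and the choice k = 3 are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_lists(points, k):
    """Return, for each point, the indices of its k nearest neighbors (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        # Rank every other point by distance to p; ties are broken by index.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (math.dist(p, points[j]), j),
        )
        neighbors.append(others[:k])
    return neighbors


def k_occurrence(neighbors):
    """Count how often each point appears in the other points' k-NN lists."""
    counts = Counter(j for nbrs in neighbors for j in nbrs)
    return [counts.get(i, 0) for i in range(len(neighbors))]


def snn_similarity(neighbors, i, j):
    """Shared nearest neighbor similarity: size of the overlap of two k-NN lists."""
    return len(set(neighbors[i]) & set(neighbors[j]))


# Hypothetical toy data: two tight groups on a line and one point between them.
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,), (6.5,)]
neighbors = knn_lists(points, k=3)
hub_scores = k_occurrence(neighbors)

print("k-occurrence per point:", hub_scores)
print("SNN similarity, same group (0, 1):", snn_similarity(neighbors, 0, 1))
print("SNN similarity, different groups (0, 3):", snn_similarity(neighbors, 0, 3))
```

In this toy set the point between the two groups appears in the neighbor list of every other point, so it behaves as a hub, and it is the only neighbor shared by points from different groups. Points within the same group share more neighbors than points from different groups, which is the property a shared nearest neighbor clustering relies on.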


Similar resources

FUZZY K-NEAREST NEIGHBOR METHOD TO CLASSIFY DATA IN A CLOSED AREA

Clustering of objects is an important area of research and application in a variety of fields. In this paper we present a technique for data clustering and apply it to data clustering in a closed area. We compare this method with K-nearest neighbor and K-means.


Estimation of Density using Plotless Density Estimator Criteria in Arasbaran Forest

Sampling methods have a theoretical basis and should be operational in different forests; therefore selecting an appropriate sampling method is effective for accurate estimation of forest characteristics. The purpose of this study was to estimate the stand density (number per hectare) in Arasbaran forest using a variety of the plotless density estimators of the nearest neighbors sampling me...


Software Cost Estimation by a New Hybrid Model of Particle Swarm Optimization and K-Nearest Neighbor Algorithms

A successful software project should be completed within a predetermined cost and time. Software is a product whose cost is driven mainly by its expert workforce and professionals. The most important part of software cost estimation (SCE) relates to the trained workforce. The creative and abstract nature of software projects makes the cost and time of projects extremely difficult ...


Improving Accuracy in Intrusion Detection Systems Using Classifier Ensemble and Clustering

Recently, with the development of technology, the number of network-based services is increasing, and sensitive user information is shared through the Internet. Accordingly, large-scale malicious attacks on computer networks could cause severe disruption to network services, so cybersecurity has become a major concern for networks. An intrusion detection system (IDS) could be cons...


Clustering Using Shared Reference Points Algorithm Based On a Sound Data Model

A novel clustering algorithm CSHARP is presented for the purpose of finding clusters of arbitrary shapes and arbitrary densities in high dimensional feature spaces. It can be considered as a variation of the Shared Nearest Neighbor algorithm (SNN), in which each sample data point votes for the points in its k-nearest neighborhood. Sets of points sharing a common mutual nearest neighbor are cons...



Publication date: 2015